Covid - flexdashboard (documentation)
Packages
These are the packages for data wrangling, import, and visualization.
# dashboard
library(flexdashboard)
library(knitr)
library(DT)
library(janitor)
# theme
library(brotools)
library(hrbrthemes)
library(ggthemes)
library(bbplot)
# data wrangling
library(tidyverse)
library(lubridate)
library(socviz)
# data visualization
library(plotly)
library(gganimate)
library(gifski)
library(skimr)
# custom skim
my_skim <- skimr::skim_with(
numeric = skimr::sfl(p25 = NULL, p75 = NULL))
# map
library(rmapshaper)
library(sugarbag)Updates (version 0.1.4)
-
Wrangling steps have been moved into
helpers.Rscript (includes bothimportandwranglescripts from thecodefolder)
-
Replace all use of
worldwidewithglobal
-
Color background of global maps set to
"whitesmoke"
-
Global maps for
New Cases(orthographic) andRecovered(Mercator)
-
Change contents of
valueBox()s to sentence case
-
new colors in
valueBox()’s "#B22222","#EE4000","#EEE9E9","#00FF7F","#FFFFF0"-
Add
geofacetmaps and usecovdatapackage for comparing NYT data with COVID tracking project
-
Fix
geofacetunits onyaxes (non-scientific notation)
-
add
bbplottheme for animated graphs
Changes coming
-
Add county maps
- Include Rt for each county
-
Pretty up the
reactableon the global data
DATA: Johns Hopkins University’s CSSE
The COVID-19 data objects are loaded below from various sources.
Some of these are updated daily, others are pre-existing datasets for geographic locators or other variables to aid in visualizing/mapping.
Import CSSE time series data from Github
This imports data from the CSSEGISandData. These are in the csse_covid_19_time_series folder. These files are updated daily.
# fs::dir_ls("code/")
source("code/01.0-import-csse-time-series.R")The following info comes from the README in the time series repo folder:
## Time series summary (csse_covid_19_time_series)
This folder contains daily time series summary tables, including confirmed,
deaths and recovered. All data is read in from the daily case report.
The time series tables are subject to be updated if inaccuracies are identified
in our historical data. The daily reports will not be adjusted in these
instances to maintain a record of raw data.
Two time series tables are for the US confirmed cases and deaths, reported at
the county level. They are named `time_series_covid19_confirmed_US.csv`,
`time_series_covid19_deaths_US.csv`, respectively.
Three time series tables are for the global confirmed cases, recovered cases
and deaths. Australia, Canada and China are reported at the province/state
level. Dependencies of the Netherlands, the UK, France and Denmark are listed
under the province/state level. The US and other countries are at the country
level. The tables are renamed `time_series_covid19_confirmed_global.csv` and
`time_series_covid19_deaths_global.csv`, and
`time_series_covid19_recovered_global.csv`, respectively.
### Update frequency
Once a day around 23:59 (UTC).
### Deprecated warning
The files below were archived here, and will no longer be updated. With the
release of the new data structure, we are updating our time series tables to
reflect these changes.
Please reference `time_series_covid19_confirmed_global.csv` and
`time_series_covid19_deaths_global.csv` for the latest time series data.
`time_series_19-covid-Confirmed.csv`
`time_series_19-covid-Deaths.csv`
`time_series_19-covid-Recovered.csv`Export the raw csse covid 19 time series data (global)
All of the raw data files are in this list
names(covid_raw_data_files)#> [1] "GDPRaw" "TSConfirmedRaw" "TSConfirmedUSRaw" "TSDeathsRaw"
#> [5] "TSDeathsUSRaw" "TSRecoveredRaw"
I’ve dropped each datasets into a raw folder with a date stamp for safe keeping.
fs::dir_tree(paste0("data/raw/",
base::noquote(lubridate::today())),
recurse = FALSE)#> data/raw/2020-07-19
#> ├── 2020-07-19-GDPRaw.csv
#> ├── 2020-07-19-TSConfirmedRaw.csv
#> ├── 2020-07-19-TSConfirmedUSRaw.csv
#> ├── 2020-07-19-TSDeathsRaw.csv
#> ├── 2020-07-19-TSDeathsUSRaw.csv
#> └── 2020-07-19-TSRecoveredRaw.csv
Wrangling CSSE time series data
The code chunk below runs the script for the wrangling steps to create the data visualizations using the time series CSSE data.
# fs::dir_ls("code/")
source("code/02.0-wrangle-csse-time-series.R")The following steps were taken to wrangle the time series data:
- Convert wide to long (
Confirmed,Recovered,Deaths)
- first I converted
TSConfirmedRawdataset to long form, and converts theDatevariable tomdy()
- Create
WorldTSDataAllby joiningConfirmed,Recovered,Deaths
- This joins the
Confirmed,Recovered, andDeathstogether intoWorldTSDataAll
USTSDataAll= joinConfirmedUSandDeathsUS
- I want to mimic what I did with the
WorldTSDataAlland join these two together. I wantcountry_regionto just be namedcountry, andprovince_stateto just be namedstate. Export these files to processed folder
- Create
SumRegionDate
- this groups the
WorldTSDataAlldata bycountry_regionanddate, then summarizes theconfirmed_sum,recovered_sum, anddeaths_sumvariables. Then it creates a “new case” column withdplyr::lag()withconfirmed_sumandfiltersthedatetomax(date)
- create a most recent day from
SumRegionDatecalledrecent_day
- GDP Country Codes: create a smaller version of the
GDPRawdataset. I also rename some of theregionsinGdp2016
Gdp2016 %>%
dplyr::filter(str_detect(string = region, pattern = "Gambia, The"))#> # A tibble: 1 x 4
#> region code year country_region
#> <chr> <chr> <dbl> <chr>
#> 1 Gambia, The GMB 2016 Gambia
- Create
SumRegionDateCodesby joiningSumRegionDateandGdp2016
- Join the
SumRegionDateto theGdp2016data country. And because this is the first complete dataset I will be using for data visualizations, I will export this into thedata/processedfolder
fs::dir_ls(paste0("data/processed/",
base::noquote(lubridate::today())),
regexp = "SumRegionDate.csv")#> data/processed/2020-07-19/2020-07-19-SumRegionDate.csv
Dashboard layout
This section covers the data visualizations in the dashboard. Because the dashboard is built using multiple pages, with orientation: columns and vertical_layout: fill. Read more about this layout here.
Page 1: Global COVID-19 Data (Maps)
Page 1 is titled, Global COVID-19 Data (Maps). This contains the data from the WorldTSRecent dataset, the
Column 1: data-width=300 .bg-secondary
Column {data-width=250 .bg-secondary}
-----------------------------------------------------------------------
These are built using the following valueBox()s. The dataset WorldTSRecent is below:
rmarkdown::paged_table(
head(WorldTSRecent)
)This is a tibble with a single row for the most recent stats.
The valueBox() and prettyNum() functions display the objects below:
Total global confirmed cases as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$confirmed_sum#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "tomato")New global cases as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$`New Case`#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "sandybrown")Global deaths as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$deaths_sum#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "gray50")Global recovered cases as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$recovered_sum#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "palegreen")Days since first confirmed case at…
WorldTSRecent$date#> [1] "2020-07-18"
case_no1#> [1] "2020-01-22"
days_passed#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)`
valueBox(prettyNum(days_passed, big.mark = ","), color = "lightgoldenrodyellow")Column 2: data-width=700 .tabset
Column {data-width=700 .tabset}
-----------------------------------------------------------------------
This is the plotly::plot_geo() world map. The dataset that it requires is SumRegionDateCodes, and it’s available to view below:
rmarkdown::paged_table(
head(SumRegionDateCodes)
)Tab 1: Global Confirmed Cases (.tabset)
The visualization uses the plotly::plot_geo() function, which renders a full interactive globe!
# create recent_day
recent_day <- max(SumRegionDateCodes$date)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
geo <- list(
bgcolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the globe option
projection = list(type = "orthographic"),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_confirm_cases <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 confirmed cases as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~Confirmed,
color = ~Confirmed,
colors = "Reds",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_confirm_casesTab 2: Global New Cases (.tabset)
# create recent_day
recent_day <- max(SumRegionDateCodes$date)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
geo <- list(
bgcolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the globe option
projection = list(type = "orthographic"),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_new_cases <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 new cases as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~`New Cases`,
color = ~`New Cases`,
colors = "Oranges",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_new_casesTab 3: Global Deaths (.tabset)
This is now a Mercator map of the confirmed deaths from COVID.
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
# c("#B0E0E6", "#F0FFF0")
geo <- list(
oceancolor = "whitesmoke",
showframe = FALSE,
showcoastlines = FALSE,
# this is the mercator option
projection = list(type = 'Mercator'),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_deaths <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "whitesmoke",
title = paste0("Global COVID-19 deaths as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~Deaths,
color = ~Deaths,
colors = "Greys",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_deaths.tabset)
# Set country boundaries as light gray
line <- list(color = toRGB("#d1d1d1"), width = 0.2)
# create geo for map options
# c("#B0E0E6", "#F0FFF0")
geo <- list(
oceancolor = "aliceblue",
showframe = FALSE,
showcoastlines = FALSE,
# this is the mercator option
projection = list(type = 'Mercator'),
resolution = "100",
showcountries = TRUE,
showocean = TRUE,
showlakes = FALSE,
showrivers = FALSE)
geo_map_recovered <- plotly::plot_geo() %>%
layout(
geo = geo,
paper_bgcolor = "aliceblue",
title = paste0("World COVID-19 recovered cases as of ",
recent_day)) %>%
add_trace(
data = SumRegionDateCodes,
z = ~Recovered,
color = ~Recovered,
colors = "Greens",
text = ~country_region,
locations = ~code,
marker = list(line = line))
geo_map_recoveredTab 5: Dataset (Old)
This is the old table with DT.
SumRegionDateCodes %>%
dplyr::select(`Country region` = country_region,
`Country code` = code,
Date = date,
Confirmed,
`New Cases`,
Recovered,
Deaths) %>%
dplyr::arrange(desc(Confirmed)) %>%
DT::datatable(
rownames = FALSE,
fillContainer = TRUE,
options = list(
bPaginate = FALSE))Tab 5: Dataset (new)
It has been converted to a reactable.
library(reactable)
data <- SumRegionDateCodes %>%
dplyr::select(Country = country_region,
`Country code` = code,
Date = date,
Confirmed,
`New Cases`,
Recovered,
Deaths) %>%
dplyr::arrange(desc(Confirmed))
reactable::reactable(data,
defaultSorted = "Confirmed",
columns = list(
Confirmed = colDef(
name = "Confirmed",
defaultSortOrder = "desc",
format = colFormat(prefix = "")
),
Country = colDef(
name = "Country",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
),
Date = colDef(
name = "Date",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
# format = colFormat(percent = TRUE, digits = 1)
)
)
)To do: Convert this table to sparkline and add histograms.
Page 2: Global COVID-19 Cases (Graphs)
Column 1: data-width=300 .bg-secondary
Total global confirmed cases as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$confirmed_sum#> [1] 14288689
### `r paste0("Total global confirmed cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$confirmed_sum, big.mark = ","), color = "#B22222")New global cases as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$`New Case`#> [1] 233390
### `r paste0("New global cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$`New Case`, big.mark = ","), color = "#EE4000")Global deaths as of…
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$deaths_sum#> [1] 602138
### `r paste0("Global deaths as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$deaths_sum, big.mark = ","), color = "#EEE9E9")Global recovered cases as of
WorldTSRecent$date#> [1] "2020-07-18"
WorldTSRecent$recovered_sum#> [1] 7904159
### `r paste0("Global recovered cases as of ", WorldTSRecent$date)`
valueBox(prettyNum(WorldTSRecent$recovered_sum, big.mark = ","), color = "#00FF7F")Days since first confirmed case at…
case_no1#> [1] "2020-01-22"
days_passed#> Time difference of 179 days
### `r paste0("Days since first confirmed case at ", case_no1)`
valueBox(prettyNum(days_passed, big.mark = ","), color = "#FFFFF0")Column 2: data-width=700 .tabset
Data inputs:
- Create
WorldTSDataAllDateandWorldTSDataRecent
WorldTSDataAllDateisWorldTSDataAllgrouped bydate
- This groups by the
datecolumn, the summarized theconfirmed,deaths, andrecovered
- Create
WorldTSDataAllDateLongfromWorldTSDataAllDate
- Now I restructure (pivot) to create
WorldTSDataAllDateLong. These data are exported into the processed data folder
Below I create two new graphs with gganimate using global and national data.
- Create World data (animated) =
WorldTSIncrementLong
- This requires an
incrementaldataset that calculates newcases,deaths, andrecoveredpatients withdplyr::lag()
status variable - dplyr::lag(status variable, 1)
- Here I filter the
WorldTSDataAllto only the US (WorldTSDataUS) and renameprovince_stateandcountry_region
country_region=="US"
- Create USA data (animated) =
USTSDataAllIncrementLong
- then I create an incremental dataset for US states by grouping by
state, calculating thelag(betweenmetric - dplyr::lag(metric)), then summarizing bydate
- Now I use the incremental dataset to animate the
ggplotusinggganimate
- finally I build the animation, first with
ggplot2, then pass the plot object togganimate::transition_reveal(date)and assign theggplot2::labs()
fs::dir_ls(paste0("data/processed/",
base::noquote(lubridate::today())),
regexp = "WorldTSData")#> data/processed/2020-07-19/2020-07-19-WorldTSDataAll.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDate.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataAllDateLong.csv
#> data/processed/2020-07-19/2020-07-19-WorldTSDataUS.csv
Tab 1: Global COVID-19 Cases (Animated)
colors <- c("#B22222", # confirmed
"gray65", # deaths
"green4") # recovered
world_cum_cases <- WorldTSIncrementLong %>%
ggplot2::ggplot(mapping = aes(x = date,
y = increment,
group = case,
color = case)) +
ggplot2::geom_line(show.legend = FALSE) +
ggplot2::scale_y_continuous(labels = scales::label_number_si(accuracy = 1)) +
ggplot2::scale_color_manual(values = colors) +
ggplot2::geom_segment(aes(xend = max(date) - 1,
yend = increment),
linetype = "dashed",
size = 0.5,
colour = "grey75",
show.legend = FALSE) +
# this adds the labels to the graph
ggplot2::geom_text(aes(x = max(date) + 0.2,
label = case),
nudge_x = -7.5,
show.legend = FALSE,
hjust = 0) +
# this adds the theme/font
ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
# set the coordinates
ggplot2::coord_cartesian(
xlim = c(min(WorldTSIncrementLong$date),
max(WorldTSIncrementLong$date) + 7),
ylim = c(max(0, min(WorldTSIncrementLong$increment)),
max(WorldTSIncrementLong$increment)),
clip = "off") +
ggplot2::theme(legend.position = c(0.1, 0.8),
# remove x axis title
axis.title.x = element_blank()) +
# remove guides
ggplot2::guides(size = FALSE) +
# add the points
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
# add scale size
ggplot2::scale_size(range = c(2, 10)) +
# this is the transition (x axis variable)
gganimate::transition_reveal(date) +
ggplot2::labs(title = "New Global COVID-19 Cases",
subtitle = "Date: {frame_along}",
y = "New daily cases",
x = "Date")
animate_world_cum_cases <- gganimate::animate(world_cum_cases,
nframes = 150,
fps = 10,
rewind = TRUE,
renderer = gifski_renderer(loop = TRUE))Now I save and render.
# and save
gganimate::anim_save(filename =
base::paste0(base::noquote(lubridate::today()),
"-animate_world_cum_cases.gif"),
animation = last_animation(),
path = "figs/")knitr::include_graphics(path =
base::paste0("figs/",
base::noquote(lubridate::today()),
"-animate_world_cum_cases.gif"))Tab 2: Global COVID-19 Cases (Cumulative)
# set colors
colors <- c("#B22222", # confirmed
"gray65", # deaths
"green4") # recovered
# font style
font_style <- list(
family = "Ubuntu",
size = 14,
color = 'black')
# create base chart
world_cum_point_chart <- WorldTSDataAllDateLong %>%
ggplot2::ggplot(aes(x = date,
y = cases,
color = status)) +
geom_point(size = 1, alpha = 2/5) +
scale_color_manual(values = colors) +
scale_y_continuous(labels = scales::label_number_si(accuracy = 0.1)) +
theme(
plot.margin = margin(0, 0, 0, 0, "pt"),
panel.background = element_rect(fill = "White"),
legend.position = "left",
axis.title = element_blank(),
axis.ticks = element_blank()) +
hrbrthemes::theme_ipsum_tw(plot_title_family = "Ubuntu") +
labs(title = "Global COVID-19 Cumulative Cases",
y = "Cases",
x = "Date",
color = " ")
# pass over to plotly
ggplotly(world_cum_point_chart) %>%
plotly::layout(legend = list(orientation = "h"),
font = font_style)Page 3: US COVID-19 Data (Maps)
This page is the US time-series COVID-19 data.
Column 1: data-width=300 .bg-secondary
Total US confirmed cases as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$confirmed_sum#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")New US cases as of …
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$new_case_sum#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")US recovered cases as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$recovered_sum#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")US deaths as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$deaths_sum#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")Days since first confirmed case on…
us_first_case_day#> [1] "2020-01-22"
us_days_passed#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)`
valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")Column 2: data-width=700 .tabset
Tab 1: United States Confirmed Cases
This is a map of the COVID-19 data for the US using the county-level data from the USTSDataAll dataset, but I need to reduce this to the 51 states in the continental US.
The data for these are stored in the confirmed_us vector.
head(confirmed_us, 10)#> Alabama Alaska Arizona Arkansas California Colorado
#> 65234 1796 141265 32533 380745 39770
#> Connecticut Delaware Florida Georgia
#> 47893 13429 337569 139880
These are paired with the state.name and state.abb vectors from the maps package. Read more about how to create these in the plotly-r book.
us_map_layout <- list(
scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = confirmed_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "Current US Confirmed Cases")The same map is created below using the total confirmed deaths.
head(deaths_us, 10)#> Alabama Alaska Arizona Arkansas California Colorado
#> 1286 18 2730 357 7702 1752
#> Connecticut Delaware Florida Georgia
#> 4396 523 4895 3169
Tab 2: United States Deaths
us_map_layout <- list(
scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = deaths_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "Current US Deaths")Tab 2: United States New Cases
Below is a map of new US cases.
head(new_case_us, 10)#> Alabama Alaska Arizona Arkansas California Colorado
#> 2143 62 2742 771 7486 444
#> Connecticut Delaware Florida Georgia
#> 0 92 10328 4688
us_map_layout <- list(
scope = 'usa',
lakecolor = "#3399FF",
projection = list(type = 'albers usa'))
plot_geo() %>%
add_trace(
z = new_case_us,
text = state.name,
span = I(0),
locations = state.abb,
locationmode = 'USA-states') %>%
layout(geo = us_map_layout,
title = "New US Cases")Tab 4: US Map Data
library(reactable)
data <- SumUSDataMap %>%
dplyr::select(State = state,
Date = date,
Confirmed = confirmed_sum,
`New Cases` = new_case_sum,
Deaths = deaths_sum) %>%
dplyr::arrange(desc(Confirmed))
reactable::reactable(data,
defaultSorted = "Confirmed",
columns = list(
State = colDef(
name = "State",
format = colFormat(prefix = "")
),
Date = colDef(
name = "Date",
format = colFormat(separators = TRUE)
),
Confirmed = colDef(
name = "Confirmed",
defaultSortOrder = "desc",
format = colFormat(separators = TRUE)
),
`New Cases` = colDef(
name = "New Cases",
format = colFormat(separators = TRUE)
),
Deaths = colDef(
name = "Deaths",
format = colFormat(separators = TRUE)
)
)
)Page 4: US COVID-19 Data (Graphs)
Column 1: data-width=300 .bg-secondary
Total US confirmed cases as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$confirmed_sum#> [1] 3711413
### `r paste0("Total US confirmed cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$confirmed_sum, big.mark = ","), color = "#B22222")New US cases as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$new_case_sum#> [1] 63698
### `r paste0("New US cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$new_case_sum, big.mark = ","), color = "#EE4000")US recovered cases as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$recovered_sum#> [1] 1122720
### `r paste0("US recovered cases as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$recovered_sum, big.mark = ","), color = "#EEE9E9")US deaths as of…
SumUSRecentCountry$date_max#> [1] "2020-07-18"
SumUSRecentCountry$deaths_sum#> [1] 140119
### `r paste0("US deaths as of ", SumUSRecentCountry$date_max)`
valueBox(prettyNum(SumUSRecentCountry$deaths_sum, big.mark = ","), color = "#00FF7F")Days since first confirmed case on…
us_first_case_day#> [1] "2020-01-22"
us_days_passed#> Time difference of 179 days
### `r paste0("Days since first confirmed case on ", us_first_case_day)`
valueBox(prettyNum(us_days_passed, big.mark = ","), color = "#FFFFF0")Column 2 data-width=700 .tabset
Tab 1: US New Cases (Animated) (.tabset)
Now I can create the animated plot with ggplot2, specify the animation (gganimate::transition_reveal(date)), and pass it to the animation.
colors <- c("#B22222", # confirmed
"gray65", # deaths
"green4") # recovered
us_cum_cases <- USTSDataAllIncrementLong %>%
# format for the K y axis
dplyr::mutate(increment = increment/1000) %>%
ggplot2::ggplot(aes(x = date,
y = increment,
group = case,
color = case)) +
# add line
ggplot2::geom_line(show.legend = FALSE) +
ggplot2::scale_color_manual(values = colors) +
# add segment, no legend
ggplot2::geom_segment(aes(xend = max(date) - 1,
yend = increment),
linetype = "dashed",
size = 0.5,
color = "grey75",
show.legend = FALSE) +
# add text, no legend
ggplot2::geom_text(aes(x = max(date) + 0.2,
label = case),
hjust = 0,
nudge_x = -8.0,
show.legend = FALSE) +
# set theme
ggthemes::theme_few(base_size = 10, base_family = "Ubuntu") +
# set cartesian coordinates to min/max dates
ggplot2::coord_cartesian(xlim = c(min(USTSDataAllIncrementLong$date),
max(USTSDataAllIncrementLong$date) + 7),
clip = "off") +
# position the legend
ggplot2::theme(legend.position = c(0.1, 0.8),
# no x axis title
axis.title.x = element_blank()) +
# no guides
ggplot2::guides(size = FALSE) +
ggplot2::geom_point(aes(size = increment),
alpha = 0.7,
show.legend = FALSE) +
ggplot2::scale_size(range = c(2, 10)) +
# set transition
gganimate::transition_reveal(date) +
# assign labs
ggplot2::labs(title = "US COVID-19 Cases",
subtitle = "at date: {frame_along}",
y = "New Cases",
color = "Status",
x = "Date") +
ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K"))
animated_us_cum_cases <- gganimate::animate(us_cum_cases,
nframes = 150,
fps = 10,
rewind = TRUE,
renderer = gifski_renderer(loop = TRUE))# and save
gganimate::anim_save(filename =
base::paste0(base::noquote(lubridate::today()),
"-animated_us_cum_cases.gif"),
animation = last_animation(),
path = "figs/")knitr::include_graphics(path =
base::paste0("figs/",
base::noquote(lubridate::today()),
"-animated_us_cum_cases.gif"))Column 3: data-width=700 .tabset
These are the geofacet plots, built with data from the covdata package.
The covdata package
I’ll be using data from the covdata package by Kieran Healy. The goal with this package is to build an set of graphs that the user can select an input (selectInput()) from a list of metrics, and see that metric reflected across all 50 states.
The datasets I’ll be using are covus and nytcovstate. The script below imports and wrangles these data.
# fs::dir_ls("code")
source("code/01.1-import-wrangle-geofacet.R")The covus data is a tidy dataset, with a date for each day, and each metric in the measure variable.
rmarkdown::paged_table(
Covus %>% head())A cleaner version of the measure variable is stored in the measure_label variable.
rmarkdown::paged_table(
dplyr::distinct(Covus, measure_label))rmarkdown::paged_table(
dplyr::distinct(Covus, measure))Now I add the necessary date variables grouped by state, remove the regions not included in the geofacet, and put these in a dataset called MapCovus.
rmarkdown::paged_table(
MapCovus %>% head())I then created a table for positive tests called PosMapCovus.
rmarkdown::paged_table(
PosMapCovus %>% head())This was indexed on Positive Test Metric and Positive Test Value.
rmarkdown::paged_table(
TidyPosMapCovus %>% head())The NYTCovState data has cases and deaths.
rmarkdown::paged_table(
NYTCovState %>% head())The NYTCovState was joined to TidyPosMapCovus to create the TidyCovDeathData data, which I use to compare deaths between NYT and COVID-tracking project.
rmarkdown::paged_table(
TidyCovDeathData %>% head())I’ve covered how to create the geofacet graphs in this storybench post.
Tab 2: Positive Tests
These are the positive tests from the COVID tracking project data.
geofacet_pos <- PosMapCovus %>%
# adjust for scales on y axis
dplyr::mutate(`positive tests` = `positive tests`/1000) %>%
# plot this with new adjusted positive tests
ggplot2::ggplot(aes(x = days_elapsed,
y = `positive tests`,
group = date)) +
geom_col(alpha = 2/10,
linetype = 0) +
ggplot2::geom_line(data = TidyPosMapCovus,
mapping = aes(x = days_elapsed,
y = `Positive Test Value`/1000,
group = `Positive Test Metric`,
color = `Positive Test Metric`),
show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
ggplot2::labs(title = "US positive COVID tests (7-day rolling average)",
subtitle = paste0("Between ",
min(PosMapCovus$date),
" and ",
max(PosMapCovus$date)),
caption = "SOURCE: https://covidtracking.com/",
y = "New Positive Tests",
x = "Days Elapsed") +
scale_y_continuous(label = scales::unit_format(unit = "K")) +
ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_posggsave(plot = geofacet_pos,
filename = "figs/geofacet_pos.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)Tab 3: US Cases
I want to compare the NYT and COVID-19 tracking datasets (positive tests vs. NYT cases). This can be accomplished using the TidyCovCaseData which has both cases from the NYT dataset, and the positive measure from the COVID tracking project dataset.
geofacet_cases <- TidyCovCaseData %>%
# adjust for y scale formatting (`K`)
dplyr::mutate(`Cases Value` = `Cases Value`/1000) %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `Cases Value`,
group = `Cases Key`,
color = `Cases Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
ggplot2::labs(title = "US COVID Cases",
subtitle = paste0("Between ",
min(TidyCovCaseData$date),
" and ",
max(TidyCovCaseData$date)),
y = "New Cases",
x = "Days Elapsed",
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") +
ggplot2::scale_y_continuous(label = scales::unit_format(unit = "K")) +
ggplot2::theme_minimal() +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_casesggsave(plot = geofacet_cases,
filename = "figs/geofacet_cases.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)Tab 4: US Deaths
Finally, I can repeat this process, but compare deaths between the NYT and COVID-tracking datasets (which later I can turn into a selectInput).
geofacet_deaths <- TidyCovDeathData %>%
ggplot2::ggplot(aes(x = days_elapsed,
y = `Death Value`,
group = `Death Key`,
color = `Death Key`)) +
ggplot2::geom_line(show.legend = TRUE) +
geofacet::facet_geo( ~ state,
grid = "us_state_grid2",
scales = "free_y") +
ggplot2::labs(title = "US COVID deaths",
subtitle = paste0("Between ",
min(DeathsMapCovus$date),
" and ",
max(DeathsMapCovus$date)),
y = "New Positive Tests",
x = "Days Elapsed",
caption = "https://covidtracking.com | https://github.com/nytimes/covid-19-data") +
ggthemes::theme_tufte(base_size = 10, base_family = "Ubuntu") +
ggplot2::theme(axis.text.x = element_text(angle = 315),
legend.position = "top")
geofacet_deathsggsave(plot = geofacet_deaths,
filename = "figs/geofacet_deaths.png",
device = "png",
dpi = "retina",
width = 16,
height = 10,
units = "in",
limitsize = FALSE)Page 5: Sources
Data Sources:
The time series data comes from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) COVID-19 dashboard data. You can find the time series files here. These files are updated daily with new cases, recovered, and deaths.
The country codes come from the GDP country codes in 2016. Will be updated as needed.
The covdata package from Kieran Healy contains data from the NYT database, COVID tracking project, and others.